Transliteration based Search Engine for Multilingual Information Access

Authors

  • Anand Arokia Raj
  • Harikrishna Maganti

Abstract

Most Internet data in Indian languages exists in various encodings, which makes it difficult to search for information through search engines. In the Indian scenario, the majority of web pages are not searchable, or the intended information is not retrieved efficiently by search engines, for the following reasons: (1) multiple text encodings are used when authoring websites; (2) in spite of Indian languages sharing a common phonetic nature, common words such as loan words (borrowed from other languages like Sanskrit, Urdu or English), transliterated terms and pronouns cannot be searched across languages; (3) the query input mechanism is another major problem: most users hardly know how to type in their native language and prefer to access information through English-based transliteration. This paper addresses all these problems and presents a transliteration-based search engine (inSearch) capable of searching web content in 10 multi-script, multi-encoded Indian languages.
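The abstract does not say how inSearch matches words across scripts. One common approach, sketched here as an assumption and not taken from the paper, relies on the fact that the Unicode blocks for Devanagari through Malayalam (U+0900 to U+0D7F) follow a largely parallel ISCII-derived layout: a character's offset within its 128-code-point block denotes roughly the same sound in each script. Folding every script onto the Devanagari offsets yields a script-neutral key, so the same word written in Telugu (కమల) and in Devanagari (कमल) indexes to one term. The names `script_neutral_key` and `build_index` are hypothetical.

```python
# Hypothetical sketch (not the inSearch implementation): fold the nine
# ISCII-derived Indic Unicode blocks (Devanagari U+0900 ... Malayalam U+0D00,
# 128 code points each) onto the Devanagari block so that the same word in
# different scripts shares one index key.
DEVANAGARI_START = 0x0900
INDIC_END = 0x0D80  # one past the end of the Malayalam block
BLOCK_SIZE = 0x80


def script_neutral_key(word: str) -> str:
    """Replace each Indic character by the Devanagari character at the
    same offset within its script block; other characters pass through."""
    out = []
    for ch in word:
        cp = ord(ch)
        if DEVANAGARI_START <= cp < INDIC_END:
            offset = cp % BLOCK_SIZE  # every block starts on a 0x80 boundary
            out.append(chr(DEVANAGARI_START + offset))
        else:
            out.append(ch)
    return "".join(out)


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Build an inverted index from script-neutral keys to document ids."""
    index: dict[str, set[str]] = {}
    for doc_id, text in docs.items():
        for token in text.split():
            index.setdefault(script_neutral_key(token), set()).add(doc_id)
    return index


if __name__ == "__main__":
    docs = {
        "hi-page": "\u0915\u092e\u0932",  # कमल in Devanagari
        "te-page": "\u0c15\u0c2e\u0c32",  # కమల in Telugu
    }
    index = build_index(docs)
    # A Telugu query finds both the Telugu and the Devanagari page.
    print(sorted(index[script_neutral_key("\u0c15\u0c2e\u0c32")]))
    # prints ['hi-page', 'te-page']
```

The alignment is only approximate: Tamil, for example, lacks the aspirated consonants, and several scripts have characters with no counterpart elsewhere, so a real system would need per-script exceptions. This sketch also assumes the text is already in Unicode; pages authored in legacy font encodings would first have to be converted.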

Similar resources

Modern Multilingual and Cross-lingual Information Access Technologies

In this chapter, we describe the state-of-the-art cross-lingual and multilingual strategies and their related areas. In particular, we show a WWW-based information system called MIETTA, which allows uniform and multilingual access to heterogeneous data sources in the tourism domain. The design of the search engine is based on a new cross-lingual framework. The framework integrates a cross-lingu...

Combining probability models and web mining models: a framework for proper name transliteration

The rapid growth of the Internet has created a tremendous number of multilingual resources. However, language boundaries prevent information sharing and discovery across countries. Proper names play an important role in search queries and knowledge discovery. When foreign names are involved, proper names are often translated phonetically which is referred to as transliteration. In this research...

Automatically Harvesting Katakana-English Term Pairs from Search Engine Query Logs

This paper describes a method of extracting katakana words and phrases, along with their English counterparts from non-aligned monolingual web search engine query logs. The method employs a trainable edit distance function to find pairs that have a high probability of being equivalent. These pairs can then be used to further bootstrap training of the edit distance function, ...

The MultiMatch Prototype: Multilingual/Multimedia Search for Cultural Heritage Objects

MultiMatch is a 30-month targeted research project under the Sixth Framework Programme, supported by the unit for Content, Learning and Cultural Heritage (Digicult) of the Information Society DG. MultiMatch is developing a multimedia/multilingual search engine designed specifically for the access, organization and personalized presentation of cultural heritage information. The demonstration wil...

Multilingual Information Retrieval in World Wide Web

The article addresses: (1) the design of an information retrieval (IR), as the Multilingual Information Retrieval Tool Hierarchy (MIRTH), which with virtual corpora on the World Wide Web, also known as Web or WWW. It is motivated by the desire to create a search engine to retrieve information by accessing a virtual. (2) The implementation of a general model of multilingual retrieval for the W...


Publication date: 2009